Short Text Classification with Tolerance-Based Soft Computing Method

Abstract

Text classification aims to assign labels to textual units such as documents, sentences and paragraphs. Some applications of text classification include sentiment classification and news categorization. In this paper, we present a soft computing technique-based algorithm (TSC) to classify sentiment polarities of tweets as well as news categories from text. The TSC algorithm is a supervised learning method based on tolerance near sets. Near sets theory is a more recent soft computing methodology inspired by rough sets where, instead of set approximation operators used by rough sets to induce tolerance classes, the tolerance classes are directly induced from the feature vectors using a tolerance level parameter and a distance function. The proposed TSC algorithm takes advantage of the recent advances in efficient feature extraction and vector generation from pre-trained bidirectional transformer encoders for creating tolerance classes. Experiments were performed on ten well-researched datasets which include both short and long text. Both pre-trained SBERT and TF-IDF vectors were used in the experimental analysis. Results from transformer-based vectors demonstrate that TSC outperforms five well-known machine learning algorithms on four datasets, and it is comparable with all other datasets based on the weighted F1, Precision and Recall scores. The highest AUC-ROC (Area under the Receiver Operating Characteristics) score was obtained in two datasets and comparable in six other datasets. The highest ROC-PRC (Area under the Precision–Recall Curve) score was obtained in one dataset and comparable in four other datasets. Additionally, significant differences were observed in most comparisons when examining the statistical difference between the weighted F1-score of TSC and other classifiers using a Wilcoxon signed-ranks test.
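The idea of inducing tolerance classes directly from feature vectors, using a tolerance level and a distance function, can be illustrated with a minimal sketch. This is not the published TSC algorithm: the greedy grouping, the mean-vector prototypes, the nearest-prototype prediction rule, and the names `build_tolerance_classes`, `fit`, `predict` and `eps` are all assumptions made for illustration, and the toy 2-D vectors stand in for SBERT or TF-IDF vectors.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # assumption: treat a zero vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def build_tolerance_classes(vectors, eps):
    """Greedily group indices so that every pair in a group is within eps.

    Each vector seeds one group, which grows with candidates that are
    tolerant to all current members. Groups may overlap; duplicates are
    dropped. (Hypothetical helper, a simplification for illustration.)
    """
    classes = []
    for i in range(len(vectors)):
        members = [i]
        for j, cand in enumerate(vectors):
            if j != i and all(
                cosine_distance(cand, vectors[m]) <= eps for m in members
            ):
                members.append(j)
        key = sorted(members)
        if key not in classes:
            classes.append(key)
    return classes


def fit(vectors, labels, eps):
    """Return one (prototype, majority_label) pair per tolerance class."""
    dim = len(vectors[0])
    model = []
    for members in build_tolerance_classes(vectors, eps):
        # Prototype = component-wise mean of the class members (assumption).
        prototype = [
            sum(vectors[m][d] for m in members) / len(members)
            for d in range(dim)
        ]
        majority = Counter(labels[m] for m in members).most_common(1)[0][0]
        model.append((prototype, majority))
    return model


def predict(model, vector):
    """Assign the label of the closest class prototype (assumption)."""
    return min(model, key=lambda pair: cosine_distance(vector, pair[0]))[1]


# Toy 2-D vectors standing in for sentence embeddings.
train_vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
train_labels = ["positive", "positive", "negative", "negative"]
model = fit(train_vectors, train_labels, eps=0.1)
print(predict(model, [0.95, 0.15]))
```

In this sketch a larger `eps` merges more vectors into each class and a smaller one yields many small classes, so the tolerance level would typically be tuned on held-out data.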


Similar resources

Classification of Ultra-high Resolution Images using Soft Computing

The main objective of this paper is to use a computational intelligence algorithm to prepare a map that categorizes different patterns for identifying infected areas and changes in radiation pollution. In this paper, a fuzzy inference system is proposed to determine the degree of radiation contamination in the regions. The study uses ultra-high resolution spec...

Short Text Classification Based on Improved ITC

Long text classification has achieved great results, but short text classification still needs improvement. In this paper, we first describe why we select the ITC feature selection algorithm rather than the conventional TFIDF, and the superiority of ITC over TFIDF; we then summarize the flaws of the conventional ITC algorithm and present an improved ITC feature selec...

New Method for Sentiment Classification for Short Text

With the rapid development of the Internet, the microblog platform, BBS, e-Commerce etc. gathered a lot of short messages/text, which contained subjective sentences. These sentences often had obvious inclination which reflected the sentiment of the author. By mining the author’s sentiment, such as like, angry, indignation, averseness, etc., we can analyze people’s opinion for some policy, peopl...

An Effective and Robust Method for Short Text Classification

Classification of texts potentially containing a complex and specific terminology requires the use of learning methods that do not rely on extensive feature engineering. In this work we use prediction by partial matching (PPM), a method that compresses texts to capture text features and creates a language model adapted to a particular text. We show that the method achieves a high accuracy of te...

Intellectual Capital Evaluation with Soft Computing Method

The problem of evaluating intellectual capital has attracted considerable attention. It involves many factors, such as people's utility, the market value and the potential profits. It is difficult to estimate an appropriate result using traditional accounting methods. In this paper we propose an integrated fuzzy evaluation procedure to measure intellectual capital. The main methods ...

Journal

Journal title: Algorithms

Year: 2022

ISSN: 1999-4893

DOI: https://doi.org/10.3390/a15080267